Adapting Improved Upper Confidence Bounds for Monte-Carlo Tree Search
Abstract
The UCT algorithm, which combines the UCB algorithm with Monte-Carlo Tree Search (MCTS), is currently the most widely used variant of MCTS. Recently, a number of investigations into applying other bandit algorithms to MCTS have produced interesting results. In this research, we investigate the possibility of combining the improved UCB algorithm, proposed by Auer et al. [2], with MCTS. However, several properties of the improved UCB algorithm may not be ideal for direct application to MCTS. We therefore modify the improved UCB algorithm to make it more suitable for game tree search. The Mi-UCT algorithm is this modified UCB algorithm applied to trees. The performance of Mi-UCT is demonstrated on the games of 9×9 Go and 9×9 NoGo: it is shown to outperform the plain UCT algorithm when only a small number of playouts are given, and to perform roughly on the same level when more playouts are available.
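For orientation, the selection step that plain UCT borrows from UCB can be sketched as below. This is a minimal illustration of the standard UCB1 rule only, not the improved UCB algorithm and not the Mi-UCT modification, whose details the abstract does not give. The function name `ucb1_select`, the list-based `wins`/`visits` representation, and the default exploration constant `c` are assumptions made for this example.

```python
import math


def ucb1_select(wins, visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    wins[i]   -- total reward accumulated by child i (hypothetical layout)
    visits[i] -- number of playouts through child i
    c         -- exploration constant (assumed default; tuned in practice)
    """
    total = sum(visits)
    best_index, best_score = -1, -math.inf
    for i, (w, n) in enumerate(zip(wins, visits)):
        if n == 0:
            # An unvisited child is tried before any score comparison.
            return i
        # Empirical mean plus an exploration bonus that shrinks as n grows.
        score = w / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With `c=0` the rule reduces to picking the child with the best empirical mean; a larger `c` shifts playouts toward rarely visited children.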
Related resources
Generalized Rapid Action Value Estimation
Monte Carlo Tree Search (MCTS) is the state of the art algorithm for many games including the game of Go and General Game Playing (GGP). The standard algorithm for MCTS is Upper Confidence bounds applied to Trees (UCT). For games such as Go a big improvement over UCT is the Rapid Action Value Estimation (RAVE) heuristic. We propose to generalize the RAVE heuristic so as to have more accurate es...
Monte-Carlo Expression Discovery
Monte-Carlo Tree Search is a general search algorithm that gives good results in games. Genetic Programming evaluates and combines trees to discover expressions that maximize a given fitness function. In this paper Monte-Carlo Tree Search is used to generate expressions that are evaluated in the same way as in Genetic Programming. Monte-Carlo Tree Search is transformed in order to search expres...
Creating an Upper-Confidence-Tree Program for Havannah
Monte-Carlo Tree Search and Upper Confidence Bounds provided huge improvements in computer-Go. In this paper, we test the generality of the approach by experimenting on another game, Havannah, which is known for being especially difficult for computers. We show that the same results hold, with slight differences related to the absence of clearly known patterns for the game of Havannah, in spite...
A Linear Classifier Outperforms UCT in 9x9 Go
The dominant paradigm in computer Go is Monte-Carlo Tree Search (MCTS). This technique chooses a move by playing a series of simulated games, building a search tree along the way. After many simulated games, the most promising move is played. This paper proposes replacing the search tree with a neural network. Where previous neural network Go research has used the state of the board as input, o...
Smooth UCT Search in Computer Poker
Self-play Monte Carlo Tree Search (MCTS) has been successful in many perfect-information two-player games. Although these methods have been extended to imperfect-information games, so far they have not achieved the same level of practical success or theoretical convergence guarantees as competing methods. In this paper we introduce Smooth UCT, a variant of the established Upper Confidence Bounds...
Publication date: 2015